Abstract

Ernst Mayr’s (1961, Science 134: 1501-1506) distinction between proximate and ultimate causation in biology is 
examined with regard to the acquisition of understanding in biological systematics. Rather than a two-part distinction, 
understanding in systematics is characterized by relations between three explanatory components: descriptive 
(observation statements)—proximate (ontogenetic hypotheses)—ultimate (e.g. specific and phylogenetic hypotheses). 
Initial inferential actions in each component involve reasoning to explanatory hypotheses via abductive inference, 
providing preliminary understanding. Testing hypotheses, to critically assess understanding, is varied. Descriptive- and 
proximate-level hypotheses are routinely tested, but ultimate hypotheses present inherent difficulties that impose severe 
limits, contrary to what is usually claimed in the systematics literature. The problem is compounded by imprecise 
considerations of ‘evidence’ and ‘support.’ For instance, in most cases, the ‘evidence’ offering ‘support’ for phylogenetic 
hypotheses, as cladograms, is nothing more than the abductive evidence (premises) used to infer those hypotheses, i.e. 
character data and associated phylogenetic-based theories. By definition, such evidence only offers initial, trivial 
understanding, whereas the pertinent evidence sought in the sciences is test evidence, which cannot be supplanted by 
character data. The pursuit of ultimate understanding by way of spurious procedures such as contrived testing, Bremer 
support, and resampling methods is discussed with regard to phylogenetic hypotheses. 
Every hypothesis should be put to the test by forcing it to make verifiable predictions. A hypothesis on which no 
verifiable predictions can be based should never be accepted, except with some mark attached to it to show that it is 
regarded as a mere convenient vehicle of thought—a mere matter of form. 

Peirce (1935: 5.599) 

Systematics is on a dangerous path towards irrelevancy to the remainder of biology because meaningful dialogue or 
assessment is no longer attempted, and is essentially impossible. 

Mooi and Gill (2010: 27) 


Introduction 

The subject of causation is probably the most fundamental consideration in all the sciences, as it is the continual 
desire of scientists to acquire understanding of the phenomena we encounter (Hempel 1965; Rescher 1970; Popper 
1983, 1992; Salmon 1984a; Van Fraassen 1990; Strahler 1992; Mahner & Bunge 1997; Hausman 1998; de Regt et 
al. 2009). Such understanding entails the interplay between our activities of explanation and prediction in 
conjunction with available hypotheses and theories. For instance, de Regt and Dieks (2005: 150, emphasis original) 
define the ‘criterion for understanding phenomena’ as, “A phenomenon P can be understood if a theory T of P 
exists that is intelligible (and meets the usual logical, methodological and empirical requirements).” Regarding 
biology specifically, Leonelli (2009: 197, emphasis original) characterizes understanding as “the cognitive 
achievement realizable by scientists through their ability to coordinate theoretical and embodied knowledge that 
apply to a specific phenomenon.” A notable exegesis on causation and understanding in biology was provided by 
Mayr [1961: 1503, see also 1982, 1993, 1994; a similar perspective was independently developed by Tinbergen 
(1963) with respect to ethology], in which he distinguished proximate and ultimate causation: 

...proximate causes govern the responses of the individual (and his organs) to immediate factors of the 
environment while ultimate causes are responsible for the evolution of the particular DNA code of 
information with which every individual of every species is endowed. 

Beatty (1994: 334) summarized Mayr’s distinction as follows: 

The proximate causes of an organism’s traits occur within the lifetime of the organism. They involve the 
expression of the information contained in the organism’s genetic material, as mediated by the 
environment. The ultimate causes occur prior to the lifetime of the organism, within the evolutionary 
history of the organism’s species. 

In his recent treatment of the subject, Ariew (2003) provided some crucial clarifications. In rightly removing 
Mayr’s emphasis on DNA and information, Ariew (2003: 555) characterized proximate causation more generally 
as “the causal capacities of structural elements” that are part of the life history of an organism. But in 
contradistinction to ultimate causation, Ariew suggested that what is really being referred to is ‘evolutionary 
explanation,’ which subsumes not only natural selection, which was the only cause referred to by Mayr (1961), but 
other causes as well, such as mutation, recombination, and genetic drift. And while evolutionary explanations 
would serve to address causal questions regarding the properties of organisms, Ariew (2003: 558, 560) saw these as 
“statistical population-level” explanations: “Evolutionary explanations range over statistical attributes of a 
population...” The intent was to counter the argument that ultimate explanations can be reduced to a series of 
individual-level proximate explanations. 

The purpose of the present paper will be to examine Mayr’s (1961) proximate-ultimate distinction from the 
perspective of the acquisition of causal understanding in biological systematics. Systematics addresses causal 
questions that span proximate and ultimate explanatory realms, with the latter consisting of several pertinent 
classes of explanations. While there is a broad range of causes that can be regarded as evolutionary, I will not 
follow Ariew (2003) in grouping all ultimate causes under that term. Although ultimate explanations entail a 
variety of hypotheses, some of which are presented below, two of the most prominent in systematics are specific 
(i.e. species taxa; Fitzhugh 2005b, 2009) and phylogenetic (cladograms, or supraspecific taxa; Fitzhugh 2008b). 
The issue to be addressed in this paper is the extent to which ultimate explanations in biological systematics not 
only lead to initial causal understanding but also result in further, critical assessments of that understanding, per the 
goal of scientific inquiry. In other words, in what capacity do such hypotheses provide understanding, and how 
pervasive is empirical support for or against that understanding as a consequence of testing?¹ What I will point out 
is that ultimate explanations in biological systematics are typically marginal vehicles for understanding. They are 
‘explanation sketches’ as characterized by Hempel (1965: 423–424). These sketches, either as species 
‘descriptions’ or graphic representations of phylogenetic hypotheses referred to as cladograms, are rarely ever 
filled out as full explanations amenable to the acts of testing so often claimed in the systematics literature (e.g. 
Wiley 1975; Gaffney 1979; Eldredge & Cracraft 1980; Rieppel 1988; Faith & Cranston 1992; Kluge 1997a, 1997b, 
1999, 2001; Grandcolas et al. 1997; Siddall & Kluge 1997; Wenzel 1997; Schuh 2000; de Queiroz & Poe 2001, 
2003; Farris et al. 2001; Faith & Trueman 2001; Faith 2004, 2006; Wheeler 2004, 2010; Franz 2005; Helfenbein & 
DeSalle 2005; Egan 2006; Grant & Kluge 2008; Schuh & Brower 2009; Faith et al. 2011; Wiley & Lieberman 
2011). Rather than promoting increased ultimate (evolutionary) understanding through the testing of specific and 
phylogenetic hypotheses, the tendency is for investigators to revert to the pursuit of enhancements of descriptive 
aspects (observation statements) regarding organisms or some proximate explanations. Two complicating issues, 
which have been addressed elsewhere, are the following: (1) the lack of emphasis on the causal questions that 
prompt the inferences that serve as answers (Fitzhugh 2006a–c), and (2) the tendency to confuse those inferences 
with the testing of resultant hypotheses (Fitzhugh 2006a, 2008a, 2010a; e.g. Schmidt 2009). The consequence has 
been methodological mischaracterizations of those inferences and hypothesis testing under such headings as 
‘parsimony,’ ‘maximum likelihood,’ and ‘Bayesianism’ (e.g. Siddall & Kluge 1997; Faith & Trueman 2001; Haber 
2005; Schmidt 2009), among others. 

1. To be clear from the start, my reference to hypothesis testing concerns only explanatory hypotheses, not statistical 
hypotheses. If the principal goal of scientific inquiry is to extend causal understanding, then there is the expectation that a 
historical science like systematics should strive for that goal. The pursuit of causal understanding begins with one’s 
reactions to observations in the form of why-questions regarding observed states of affairs that are unexpected or 
surprising. We infer explanatory hypotheses that suggest possible past causal conditions, serving as answers to those 
questions. As such, cladograms qua topologies are not explanatory hypotheses. Rather, they are diagrams implying a 
variety of explanatory hypotheses regarding past causal events. Cladograms therefore are not statistical constructs. While 
there are a host of methods that are commonly implemented under the guise of ‘testing’ cladograms, e.g. bootstrap, 
jackknife, permutation and likelihood ratio tests, none of these addresses the actual explanatory hypotheses to which 
hypothesis testing would be directed. 


Biological understanding: descriptive, proximate, ultimate 

Mayr (1961) characterized biology as two separate fields, functional and evolutionary, seeking proximate and 
ultimate causes, respectively. But Mayr (1961: 1501) also recognized a third field, “purely descriptive structural 
biology.” In terms of either proximate or ultimate causal hypotheses, these would follow from one’s description(s) 
of the perceived effects in need of explanation, i.e. the properties of organisms. What is interesting is that while 
Mayr downplayed description in favor of proximate and ultimate explanations, the inference and subsequent 
communication of our observation statements do in fact serve the purpose of explanation as well. Observation 
statements are not theory-neutral constructs (Hanson 1958; Popper 1992; Godfrey-Smith 2003). We receive sense 
data as a consequence of interactions with objects. To those data we apply any variety of theories to infer concepts 
and statements that accord us some degree of understanding of those perceptions, providing bases for subsequent 
actions. For instance, my sense perceptions of a group of objects might lead me to conclude that “This is a glass of 
water.” The conclusion is inferred by that class of non-deductive reasoning known as abduction, wherein effects 
are conjoined with one or more theories to infer a tentative cause (Peirce 1878, 1931, 1932, 1933a, 1933b, 1934, 
1935, 1958a, 1958b; Hanson 1958; Achinstein 1970; Fann 1970; Reilly 1970; Curd 1980; Nickles 1980; Thagard 
1988; Josephson & Josephson 1994; Hacking 2001; Magnani 2001; Psillos 2002, 2007; Godfrey-Smith 2003; 
Norton 2003; Walton 2004; Aliseda 2006; Fitzhugh 2005a, 2005b, 2006a, 2006b, 2008a-c, 2009, 2010a; Schurz 
2008). Abductive inference can be schematized as: 

[1] • auxiliary theory(ies) 
    • theory(ies) relevant to the effects perceived 
    • perceived effects 
    ------------------------------------------------
    • explanatory hypothesis, H. 

While uttering “This is a glass of water” is an observation statement, per my current understanding of theories such 
as glass and water, the statement also serves the purpose of explaining why my perceptions are the case: the 
existence of the objects referred to as glass and water is the cause of my sense perceptions (Schurz 2008). 

In considering the nature of understanding in biological systematics, it would be neglectful not to include 
descriptions as a class of causal understanding fundamental to proximate and ultimate understanding. If it is the 
case that the goal of science is to engage in a continual process of acquiring causal understanding by way of initial 
abductive inferences and subsequent testing of hypotheses and theories, then observation statements are the 
fundamental starting points that lead to the pursuit of proximate and ultimate understanding (cf. Mayr 1982). Like 
observation statements, which provide descriptive understanding, proximate and ultimate understanding also 
originate as products of abductive reasoning, differing only in the respective sets of theories employed in each. 
Extensive treatments of the nature of abductive inference in systematics can be found in Fitzhugh (2005a, 2006a, 
2006b, 2008a, 2008b, 2009, 2010a). Examples of specific and phylogenetic inferences will be provided later in 
relation to testing such hypotheses. 

With explanatory hypotheses providing initial understanding of sets of objects and/or events, serving as 
answers to specifiable causal questions, subsequent testing would serve to assess and potentially expand or revise 
that understanding through confirming evidential support, or hypothesis revision or replacement as results of 
disconfirming evidence. In its simplest form, testing of explanatory hypotheses would first involve deductive 
inferences to predictions of potential test evidence in the form of consequences that should be encountered if the 
cause-effect relations stated by a theory as well as initial conditions presented in the hypothesis are the case 
(Schurz 2008; Fitzhugh 2010a): 

[2] • auxiliary theory(ies) 
    • theory(ies) relevant to the effects perceived 
    • specific causal conditions presented in explanatory hypothesis, H (ex [1]) 
    • proposed conditions needed to carry out test 
    ------------------------------------------------
    • perceived effects (originally prompting H; cf. [1]) 
    • predicted test evidence, i.e. effects related as closely as possible 
      with the specific causal conditions of the hypothesis. 

Ideally, this potential test evidence should consist of effects with the lowest probability of occurrence if the causal 
conditions stated in the hypothesis did not transpire (Peirce 1958a; Mayo 1996; Achinstein 2001; Fitzhugh 
2010a)²—what Cleland (2001, 2002, 2011a, 2011b) referred to as ‘smoking gun’ evidence. And by the very nature 
of the deduction from the specific causal events stated in the hypothesis, test evidence would be a class of effects 
independent of that upon which the hypothesis was originally inferred (Popper 1992; see also Tucker 2011). 
Pursuant to deriving predictions, testing (inductive sensu stricto) would be performed to determine whether or not 
observed test consequences support the hypothesis: 

[3] • auxiliary theory(ies) 
    • theory(ies) relevant to the effects perceived 
    • actual test conditions 
    • actual confirming/disconfirming evidence (observations of predicted 
      test evidence in [2]/alternate observations) 
    ------------------------------------------------
    • H is confirmed/disconfirmed. 

As will be noted later, the systematics literature too often fails to provide cogent distinctions between our 
observations of organisms we wish to explain by way of past evolutionary events (qua abduction—[1]) and the 
potential or actual evidence needed (via either de- or induction—[2], [3]) to engage in the empirical evaluations of 
those hypotheses. Failure to recognize the abductive nature of systematics inferences that serve as answers to either 
implicit or explicit causal questions, and the proper mechanics of testing, has led to erroneous efforts that conflate 
hypothesis inference with testing as well as other spurious notions of assessing hypothesis support (e.g. Schmidt 
2009). 
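
The relation among schemata [1] to [3] can be illustrated with a minimal Python sketch. It is offered only as an aid to reading, not as a method used in the systematics literature: the names `Hypothesis`, `abduce`, and `evaluate`, the theory label, and the effect strings are all hypothetical. The sketch assumes that perceived effects can be represented as simple labels, and it encodes one point from the schemata: effects that served as premises in the abduction of H cannot be reused as test evidence for H.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Explanatory hypothesis H, the conclusion of schema [1]."""
    causal_conditions: str
    premises: frozenset  # perceived effects that prompted the inference of H


def abduce(theory, perceived_effects):
    """Schema [1]: theory plus perceived effects yield a tentative hypothesis."""
    return Hypothesis(
        causal_conditions=f"past conditions that, under {theory!r}, account for the effects",
        premises=frozenset(perceived_effects),
    )


def evaluate(hypothesis, predicted, observed):
    """Schemata [2] and [3]: compare predicted test evidence with observations.

    Predictions that merely repeat the abductive premises are rejected,
    because such effects are not independent of the original inference.
    """
    predicted, observed = set(predicted), set(observed)
    if not predicted or predicted & hypothesis.premises:
        return "not a test"
    return "confirmed" if predicted <= observed else "disconfirmed"


# Illustrative labels only; none of these come from a real data set.
h = abduce("descent with modification", {"toes I-III in b-us", "toes I-III in c-us"})
print(evaluate(h, {"toes I-III in b-us"}, {"toes I-III in b-us"}))      # not a test
print(evaluate(h, {"independent effect E"}, {"independent effect E"}))  # confirmed
print(evaluate(h, {"independent effect E"}, set()))                     # disconfirmed
```

The first call returns "not a test" because the supposed test evidence is one of the effects from which H was inferred, which is the conflation of inference with testing described above.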

With this interplay between the inferences of hypotheses and their being tested, Mayr’s three classes of 
understanding can be brought to the context of systematics and summarized as follows: 

• Descriptive understanding. Critical assessment over time of answers to questions regarding properties 
(effects) instantiated by organisms; e.g. “What accounts for my sense perceptions of this organism in my visual 
field as opposed to some other object?” The question entails observations for the purpose of adding to one’s 
repertoire of properties that characterize those objects, with observation statements serving as perceptual 
hypotheses answering the questions. There is the subsequent process of testing those hypotheses by way of 
investigations into component parts that make up the more inclusive properties of the objects. 

• Proximate causal understanding. Critical assessment over time of answers to questions regarding 
properties (effects) instantiated by organisms at a moment in their life history as opposed to some other time. 
Answers are by way of proximate causes, i.e. explanations of manifestations of characters by way of intrinsic and 
extrinsic causes occurring during the lifetime of an organism (Beatty 1994; Ariew 2003); e.g. ontogenetic 
hypothesis: “Why does this individual have property Y at this stage of its life history as opposed to property X that 
occurs at another stage?” An ontogenetic hypothesis regarding manifestation of Y is provided. Continued 
evaluation of this proximate understanding would be by the consequences of testing the hypothesis that served as 
the initial answer. 

• Ultimate causal understanding. Critical assessment over time of answers to questions regarding properties 
(effects) instantiated by groups of organisms. Answers are by way of intrinsic and extrinsic causal processes 
occurring among multi-generational groups of individuals, resulting in differential expressions of properties; e.g. 
phylogenetic hypothesis: “Why do individuals to which species hypotheses b-us and c-us apply have character Y as 
opposed to character X as observed among members of a-us, x-us, etc.?” A phylogenetic hypothesis regarding 
character Y origin and fixation in an ancestral population and subsequent population splitting (‘speciation’) can be 
provided. The continued enhancement of this ultimate understanding would be by the consequences of testing the 
various causal components in the hypothesis that served as the initial answer. 

2. Reference here to low probability of test evidence is not equivalent to the ‘improbable evidence’ of character data in 
relation to alternative hypotheses as suggested by Faith (2004, 2006), Faith and Cranston (1991, 1992), Faith and Trueman 
(1998, 2001), and Faith et al. (2011). As will be described later, no amount of character data used to infer cladograms can 
serve the added purpose of then testing those hypotheses. 


Initial understanding in biological systematics 

Segregating understanding into three distinct classes when speaking of biological systematics provides a way to 
indicate the internested relations that exist in systematics research. In other words, there tends to be a three-tier 
system of inquiry: (1) descriptive understanding, in the form of observation statements, is a consequence of our 
desire to attain understanding of sense data by way of the existence of objects with particular properties, and that 
level of understanding naturally leads to (2) sets of questions answerable by proximate causes, given that we 
routinely observe individuals at different points in their life history, and (3) with observations among groups of 
organisms there are additional questions to which a variety of ultimate causes can be applied. 

Hennig (1966: fig. 6) stressed the importance of recognizing the nuances that exist among various biological 
systematics hypotheses (Fig. 1). He noted that systematists deal with at least seven classes of hypotheses within a 
spectrum of understanding transcending descriptive, proximate, and ultimate contexts. These hypotheses are 
distributed accordingly (Fig. 2): 

descriptive —individual (semaphoront) 
proximate —ontogenetic 

ultimate —tokogenetic, cyclomorphic, sexual dimorphic, polymorphic, specific (species), and phylogenetic. 

As indicated in the previous section, all of these hypotheses are products of abductive reasoning, as responses to 
implicit or explicit questions regarding one’s perceptions of individuals (Fitzhugh 2005a, 2005b, 2006a-c, 2008a-c, 
2009; Table 1). 


Continued understanding in biological systematics 

The relations between the classes of understanding presented above tend to be operationally internested from the 
perspective that our opportunities to test hypotheses that are (1) descriptive, (2) proximate, and (3) ultimate, 
respectively, tend to be increasingly restricted and thus less frequent (Fitzhugh 2010a). Compare, for 
instance, the ability to (1') test the descriptive hypothesis that I observe a lizard with three toes (I–III, as opposed to 
II–IV) in contrast to four toes (I–IV), to (2') the proximate hypothesis explaining by way of ontogeny the presence 
of toes I–III in this adult, to (3') an ultimate (phylogenetic) hypothesis explaining the presence of toes I–III in 
contrast to I–IV among individuals to which several different species hypotheses apply. Assessing the observation 
statement would minimally require (1") observations of the expected skeletal components that comprise I–III (as 
opposed to II–IV), while testing the ontogenetic hypothesis would require (2") the commitment of time and 
resources to observe causal relations among limbs and toes within individuals over some period of time during 
their life history. Testing the phylogenetic hypothesis would be the most challenging, as it would require (3") 
access to test evidence not only regarding the specific causal events associated with origin and fixation of the 
three-toe condition in an ancestral population, but also evidence of the cause(s) that led to subsequent splitting(s) of the 
population(s), colloquially referred to as ‘speciation.’ 
FIGURE 1. Hypotheses commonly encountered in biological systematics. Modified from Fitzhugh (2008b: fig. 1) based on 
Hennig’s (1966) figure 6. 


Assessing our causal understanding in biological systematics, whether descriptive, proximate, or ultimate, is 
determined by the extent to which hypothesis testing, sensu [3], is accomplished. While the act of inferring an 
explanatory hypothesis provides initial understanding, in that the hypothesis serves as an answer to at least one 
specifiable question, it is a hallmark of the sciences that critically evaluating such hypotheses allows us 
not only to gauge the current status of understanding but to alter or revise that understanding over time. But as 
noted earlier, testing biological systematics hypotheses becomes progressively more difficult as one 
proceeds from descriptive, to proximate, to ultimate explanations. A prominent limiting factor for the testing of 
higher-level ultimate hypotheses is the span of time between those effects that prompt inferences of hypotheses and 
the causal events themselves. The greater the span of time between a hypothesized cause(s) and observed effects, the 
more likely will be the eradication of relevant evidence required to test those hypotheses (Cleland 2011a). The 
consequence of the inherent difficulty with testing ultimate explanations is that two of the most prominent classes 
of evolutionary hypotheses, specific and phylogenetic (cf. Fig. 1), are almost never legitimately tested, such that 
there is no actual enhancement of understanding via hypothesis revision or replacement on the basis of empirical 
assessments of causal relations. Rather, once ultimate hypotheses have been inferred, the tendency among 
systematists is to direct focus back to the inferences of additional or revised descriptive and proximate hypotheses, 
if at all, in a purported attempt to refine ultimate understanding qua testing (see below). For instance, Hennig 
(1966: 122, emphasis added) mistakenly considered this maneuver as a process of testing phylogenetic hypotheses: 

Thus the question of whether kinship relations based on a single character or a single presumed 
transformation series of characters correspond to the actual phylogenetic relationships of the species is 
tested by means of other series of characters: by trying to bring the relationships indicated by the several 
series of characters into congruence. In the final analysis this is again the method of “checking, 
correcting, and rechecking”.... 
FIGURE 2. Relations between systematics hypotheses (cf. Fig. 1) utilized in descriptive, proximate, and ultimate explanatory 
contexts. 


Likewise, the more characters involved in this process the better (Hennig 1966: 132, emphasis added): “For 
phylogenetic systematics this means that the reliability of its results increases with the number of individual 
characters that can be fitted into transformation series.” Rindal and Brower (2011: 331; see also Brower 2006, 
2010) go so far as to make the claim that character congruence on cladograms is the only means of phylogenetic 
assessment. Unfortunately, in the nearly 40 years since Hennig’s statements, the emphasis on congruence or 
acquiring more character data under the guise of testing has grown in prominence, most notably through the 
peculiar attempts to ally this activity with Karl Popper’s notions of corroboration or falsification (e.g. Wiley 1975; 
Gaffney 1979; Eldredge & Cracraft 1980; Rieppel 1988; Kluge 1997a, 1997b, 1999, 2001; Siddall & Kluge 1997; 
Farris et al. 2001; Faith & Trueman 2001; Faith 2004; de Queiroz & Poe 2001; Lee & Camens 2009; Wiens 2009; 
Faith et al. 2011). As will be outlined in the next section, the result has been the development of a cycle of at best 
minimizing, and at worst impeding, causal understanding as a consequence of conflating the inferences of 
evolutionary hypotheses with their being tested. 


46 • Zootaxa 3435 © 2012 Magnolia Press 


FITZHUGH 






TABLE 1. Comparisons of perceptual, ontogenetic, tokogenetic, ‘intraspecific,’ specific, and phylogenetic hypotheses (modified from Fitzhugh 2008b). See Figure 1 for graphic representations of 
each hypothesis. 

Causal questions: | Relations: | Represented by: 

[Table body not recoverable from the source.] 

LIMITS OF UNDERSTANDING IN SYSTEMATICS 




… contrast to character z(0)?’ (applicable to gonochoristic or cross-fertilizing hermaphroditic organisms) 
… population, and there was subsequent splitting of that population into two or more populations. 






The process of acquiring causal understanding in biological systematics 


The previous two sections outlined what is required to proceed from sense data to three general classes of 
understanding via inferences of explanatory hypotheses. The last section ended with the observation that some 
authors in biological systematics have developed a protocol that falsely claims to move ultimate understanding 
forward by actions that are not valid test procedures. The nature of this confusion will be addressed in part in this 
section. The subsequent section will examine common approaches to evaluating phylogenetic hypotheses, either in 
the context of testing or stipulating evidential support. 

Inferences of ultimate hypotheses. Consider the following common tactic. A systematist examines 
specimens, amassing a variety of ‘morphological,’ histological, reproductive, and behavioral observations and/or 
nucleotide sequences. Select properties of observed individuals are described, leading (usually implicitly) to 
inferences of specific hypotheses, colloquially referred to as ‘species descriptions’ (contra Table 1; Nogueira et al. 
2010; Fitzhugh 2008b, 2009, 2010b). More inclusive phylogenetic hypotheses, in the form of cladograms, are often 
included, with some ‘clades’ given formal (supraspecific) names. While typically unstated, the goals of these 
actions are to determine explanatory accounts of differentially shared characters among observed individuals, as 
answers to implicit causal questions (Table 1). Given the regularity with which such ultimate hypotheses as specific 
and phylogenetic are inferred, to what extent has initial understanding of organismal properties been achieved? 
While we can readily identify descriptive understanding of the objects of interest, in terms of the characters 
instantiated by observed individuals, what is garnered in terms of ultimate understanding with specific and 
phylogenetic hypotheses is usually nothing more than vague answers to questions regarding shared characters 
(Fitzhugh 2009). Species-level hypotheses are rarely referred to as providing explanatory accounts of particular 
characters. And when such hypotheses are specified as such, they are only in the imprecise sense that characters 
originated and became fixed in an ancestral population. Explicit details regarding the causal factors involved in 
character fixation are not presented. 

Cladograms are equally vague causal accounts, at best implying that particular characters originated by some 
unspecified mechanism(s) and were subsequently fixed among members of an ancestral population by some 
unspecified mechanism(s), followed by at least one population splitting event by some unspecified mechanism(s) 
(Fig. 1). The meager explanatory standings of specific and phylogenetic hypotheses are consequences of the fact 
that neither type of inference is made on the basis of detailed theories regarding character origin, fixation, and/or 
population splitting (Fitzhugh 2006a, 2009). Overall, while we can identify answers to causal questions that 
provide initial descriptive, proximate, and ultimate understanding (Table 1, Figs. 1-2), it is the latter that is the least 
detailed in terms of conveying causal structure. 

Testing and support issues. From the basic outline of systematics practice just presented, ranging from 
observation statements to abductive inferences of specific and phylogenetic hypotheses, there is the subsequent 
matter of judging the explanatory merits of these hypotheses as matters of both assessing and extending causal 
understanding. Especially with the advent of cladistics, the emphasis on testing or garnering evidential support has 
been almost exclusively directed toward phylogenetic as opposed to intraspecific or specific hypotheses, so it will 
be with this former class of ultimate causation that hypothesis evaluation will be examined here. 

The subject of evidential support for phylogenetic hypotheses has received substantial attention, whether under 
the heading of hypothesis testing or techniques purported to measure support, e.g. Bremer support or various 
resampling protocols. But support in relation to cladograms has two quite different connotations (see footnote 3). In one sense it 
relates to abductively inferred hypotheses ([1]) and in another to subsequent testing by induction ([3]). More 
generally, and regardless of mode of inference, support refers to relations between premises and conclusion(s) 
(Longino 1979; Salmon 1984b; Achinstein 2001). This means a distinction can be made between the support for 
initial understanding that is the product of abduction, as opposed to support for assessing, expanding, or revising 
that understanding as consequences of testing via induction (Hanson 1958; Norton 2003). 


3. There is a third connotation, sometimes applied by advocates who regard cladograms as ahistorical, non-causal (and thus 
non-explanatory) diagrams. For instance, Brower (2011: 447; see also Brower 2006, 2010; Turjak & Trontelj 2012) refers 
to cladogram support as “a measure of the relative quantity of evidence favoring a hypothesis of relationships [sic], not a 
measure of whether or not the hypothesis corresponds to the actual pattern of historical cladogenesis of the taxon in 
question....” Since such a view has no relation to the pursuit of causal understanding that is the goal of scientific inquiry, there 
is no need to give it consideration in the context of hypothesis support addressed in this paper. 
Referring to the schematic outline of abduction in [1], the understanding afforded is simply a matter of 
conjoining as completely as possible observed effects in need of explanation with some theory(ies). Regardless of 
the causal depth stipulated by the theory, the initial understanding provided in the conclusion is no more than a 
tentative causal accounting. Coupled with the fact that such inferences should apply a given theory as fully as 
possible to effects, thus maximizing explanations of effects as instances of the conjoined theory, considerations of 
support for cladograms are largely trivial (cf. Schurz 2008). Indeed, applying a theory as fully as possible to 
observed effects has the default consequence of maximizing support, i.e. explaining those effects as completely as 
possible as instances of the theory. As the initial understanding conveyed by cladograms is quite meager, tallying 
support for the hypotheses they imply can be performed directly. Consider the example in Fig. 3. The premises 
comprise at a minimum relevant theories (and attendant background knowledge) and observed effects (Fig. 3A). In 
this instance a generic ‘descent with modification/common ancestry’ theory (actually several theories) would be 
one entailing novel character origin and fixation in ancestral populations, and subsequent population splittings (a 
fuller explication is given below in [10] in Increasing causal understanding—testing as intended): 

[4] If character x(0) exists among individuals of a reproductively isolated, gonochoristic or cross-fertilizing 
hermaphroditic population and character x(1) originates by mechanisms a, b, c... n, and 
becomes fixed within the population by mechanisms d, e, f... n (=ancestral species hypothesis), 
followed by event(s) g, h, i... n, wherein the population is divided into two or more reproductively 
isolated populations, then individuals to which descendant species hypotheses refer would exhibit x(1). 

While the theory in [4] is one of strict common cause, two common causes are referenced: proximate character 
origin/fixation and distal population splitting. Both classes of events are required given the diagrammatic 
representations offered by cladograms. Invoking [4] follows from the causal questions one explicitly or implicitly 
asks regarding shared characters among individuals to which two or more specific-level hypotheses apply (cf. 
Table 1; Fitzhugh 2006a, 2006b, 2008b, 2008c, 2010a). Indeed, these questions are codified in data matrices by 
way of the inclusion of outgroups (taxon ‘A’ in this example; Fitzhugh 2006c), and if the intent is to explain the 
fidelity of one’s observations, then conjoining the theory in [4] with observations of shared characters will offer 
explanatory accounts that maintain that fidelity to the greatest extent possible (Fitzhugh 2006a). There is the 
alternative view subsumed under the phrases ‘maximum likelihood’ (Felsenstein 1981, 2004; Swofford et al. 1996; 
Huelsenbeck & Crandall 1997; Haber 2011) and ‘Bayesian’ (Huelsenbeck et al. 2001; Huelsenbeck & Ronquist 
2001; Archibald et al. 2003; Ronquist et al. 2009) that asserts that common ancestry should be considered in 
conjunction with stochastic rates of character change and ‘branch lengths.’ The failure of likelihood and Bayesian 
approaches in the context of phylogenetic inference is that neither considers the relations between the causal 
questions represented in a data matrix and the abduction of explanatory hypotheses. The concept of likelihood is 
superfluous to abduction since inferred hypotheses automatically accord the highest probability on the character 
data being explained, as consequences of the conjunctions of theory(ies) with observed effects (cf. [1]; see example 
in Increasing ultimate causal understanding—testing as intended). The attendant argument from statistical 
consistency that has been used to promote ‘maximum likelihood’ methods is meaningless for abductive reasoning. 
As is the case with statistics, which pertains to induction sensu stricto, consistency is only relevant to hypothesis 
testing, under the view that such testing will ‘in the long run’ eventually lead to ‘true’ hypotheses. Consistency has 
no relevance for abduction, and by extension phylogenetic inference (Peirce 1901, 1932: 2.777, emphasis added): 

[Abduction] is the only kind of reasoning which supplies new ideas, the only kind which is, in this sense, 
synthetic. Induction is justified as a method which must in the long run lead up to the truth, and that, by 
gradual modification of the actual conclusion. There is no such warrant for [abduction]. The hypothesis 
which it problematically concludes is frequently utterly wrong itself, and even the method need not ever 
lead to the truth; for it may be that the features of the phenomena which it aims to explain have no 
rational explanation at all. Its only justification is that its method is the only way in which there can be 
any hope of attaining a rational explanation. 


The Bayesian perspective is misdirected because the ‘evidence’ of interest in phylogenetic inference is not test 
evidence. The protracted emphasis on ‘optimality criteria’ in phylogenetic inference, especially parsimony versus 
likelihood, has needlessly detracted from the more salient issue of stipulating the appropriate theory relative to 
observations in need of being explained (Fitzhugh 2006a). 
[Fig. 3C, hypothesis tallies recovered from the figure panel:] 
A(B-H): two hypotheses 
AB(C-H): two hypotheses 
ABC(D-H): five hypotheses 
ABCD(E-H): two ad hoc hypotheses 
ABCDE(F-H): two hypotheses 
ABCDEF(G-H): four hypotheses 
H: one ad hoc hypothesis 
FIGURE 3. Relations between the (A) premises and (B) conclusion of an abductive inference to phylogenetic hypotheses, and 
(C) the direct indication of the abductive support for those hypotheses. Note that such support refers not to ‘branch support,’ but 
rather the actual support for the two classes of hypotheses implied by cladograms, i.e. character origin/fixation [x(0) → x(1)] 
and subsequent population splitting events (s). See text for discussion. 
Applying the theory in [4] to effects leads to a minimum of two classes of hypotheses being implied by a 
cladogram: character origin/fixation and population splitting events. Distinguishing these hypotheses is often 
overlooked in lieu of only indicating character changes on branches (Fig. 3B), which is consistent with emphasis 
on support being characterized as ‘branch’ or ‘group support’ and the view that quantification (e.g. via indirect 
methods like resampling or Bremer support, see below) of such ‘support’ has some sort of relevance, largely under 
the misconception that the effects being explained by theory, i.e. character data, are relevant to assessing support in 
terms of the pursuit of understanding. But branch/group support is not equivalent to actually indicating support in 
terms of relations between premises and conclusion(s), and any assessment of understanding comes not from the 
premises used to infer hypotheses, but rather as consequences of testing sensu [3]. Regardless, the matter of 
documenting initial understanding provided by cladograms is uncomplicated and trivial. It is uncomplicated 
because associating particular premises to individual hypotheses implied by a cladogram is simple to document. It 
is trivial for the fact that the act of applying the theory as fully as possible to effects will result in the simplest (i.e. 
most parsimonious and with greatest likelihood) thus best supported hypothesis(es), albeit the explanatory scope of 
a cladogram is quite meager given the theory applied. But in such an instance, to say a hypothesis is best supported 
is only to acknowledge that effects are explained as fully as possible as instances of the applied theory(ies). Rather 
than branch or group support, the actual support for the hypotheses in Fig. 3B is summarized in Fig. 3C. The 
cladogram in Fig. 3B implies 18 hypotheses: 12 regarding character origin/fixation and six for population splitting 
events. Among these hypotheses, three are ad hoc, as their inference goes beyond the stipulated theory in [4]. 
Support for all hypotheses is represented by the conjunctions of theory, indicated in the premises or ad hoc, and the 
particular characters being explained. 

Acknowledging that character data plus theory abductively lead to explanatory hypotheses (Fig. 3A-B), it is 
necessary to recognize that referring to support for those hypotheses is not conveyed by branching diagrams. In 
other words, support is not in terms of relations between premises and cladograms, but rather premises and 
individual hypotheses of character origin/fixation and population splitting events that are implied by cladograms. 
For instance, it would be incorrect to say that ‘group’ (DEFGH) is better supported than (EFGH) because the 
branch subtending (DEFGH) shows more (non-ad hoc) character changes than does the branch for (EFGH) (Fig. 
3B). There are instead five equally supported hypotheses implied by (DEFGH) and two implied by (EFGH) (Fig. 
3C). Support is not considered among groups or clades but rather among the various hypotheses implied by those 
groups/clades. 

While abductive support for the hypotheses implied by a cladogram is maximized by the extent to which ad 
hoc hypotheses are avoided, as a matter of the conjunction of theory and characters, such support is unremarkable 
in that it is necessitated by the premises. Note as well that this support is not relevant to assessing the causal 
conditions implied by the cladogram or stated in the hypotheses. Actual evaluative support for those causal 
conditions would have to come from testing, which requires evidence well beyond character data (cf. Increasing 
causal understanding—testing as intended). These considerations have implications for indirect measures of 
support garnered by resampling or Bremer support methods, discussed later. 

Considerations of support in terms of test evidence have special significance for cladograms, given the emphasis 
on testing that has been associated with the development of phylogenetic systematics (cf. Increasing causal 
understanding—testing as intended). Identifying support for hypotheses presented in cladograms requires 
evidence of the form shown in the premises in [3], especially ‘actual confirming/disconfirming evidence,’ which 
allows for either supporting a hypothesis or suggesting alternatives. A cursory comparison of abductive and 
inductive inferences in [1] and [3], respectively, indicates that the class of evidence used to support the initial 
inferences of hypotheses asserting particular causal events is not the same as the class of evidence required as tests 
of those hypotheses (Fitzhugh 2006a, 2008a, 2010a). 

Testing a la Popper. A perspective begun in the 1970s, continuing to the present, is that evolutionary 
hypotheses in the form of cladograms are routinely tested as a consequence of the introduction of new characters. It 
was remarked in the previous section that this was a view held by Hennig (1966), but it was the attempted 
association of Karl Popper’s (e.g. 1959) writings on testing that placed the greatest emphasis on equating character 
data with test evidence (e.g. Wiley 1975; Gaffney 1979; Eldredge & Cracraft 1980; Wiley 1981; Rieppel 1988; 
Faith & Cranston 1992; Kluge 1997a, b, 1999, 2001; Grandcolas et al. 1997; Siddall & Kluge 1997; Wenzel 1997; 
Schuh 2000; de Queiroz & Poe 2001, 2003; Farris et al. 2001; Faith & Trueman 2001; Faith 2004, 2006; Schuh & 
Brower 2009; Jenner 2003; de Queiroz 2004; Franz 2005; Helfenbein & DeSalle 2005; Wägele 2005; Egan 2006; 
Grant & Kluge 2008; Wheeler 2004, 2010; Brower 2011; Faith et al. 2011; Rindal & Brower 2011; Hovenkamp 
2012; cf. Fitzhugh 2006a, Vogt 2008 for critical overviews; see also Sober & Steel 2002, and Sober 2008 for 
similar, yet non-Popperian perspectives). The purported process of Popperian testing in systematics is as follows. 
With the introduction of new characters, ‘predicted’ or otherwise, and their integration into an existing data matrix, 
a new round of cladogram inference is performed. If the topology/topologies of old and new cladograms are the 
same, it is claimed that this is an instance of corroboration (sensu Popper), whereas differences between topologies 
would mean the earlier cladogram(s) has been falsified. For example, given the observations 0011 and 0111 among 
individuals to which respective species hypotheses a-us, b-us, c-us, and d-us apply, the inferred cladogram would 
be (a-us (b-us (c-us, d-us))). Additional characters, 0101 and 0101, are subsequently observed and a new 
cladogram inferred, (a-us (c-us (b-us, d-us))). The standard position is that (a-us (b-us (c-us, d-us))) has been 
falsified in favor of (a-us (c-us (b-us, d-us))). 
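
The arithmetic behind this example can be reproduced with a minimal sketch, assuming equally weighted binary characters, a-us fixed as the outgroup, and a simple Fitch-style count of character changes over the three possible arrangements of b-us, c-us, and d-us. The function names and data layout are illustrative assumptions, not part of any method cited here. The sketch shows only why the shortest topology changes once the additional characters are included; it makes no claim that the change constitutes a test.

```python
# Illustrative sketch only: names and data layout are assumptions for this example.

def fitch(tree, states):
    """Fitch-style pass: return (candidate state set at this node, changes so far)."""
    if isinstance(tree, str):                      # terminal: use the observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:                                     # descendants agree: no added change
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree, characters, taxa):
    """Total number of character changes a topology requires for all characters."""
    return sum(fitch(tree, dict(zip(taxa, column)))[1] for column in characters)


def shortest_tree(trees, characters, taxa):
    """Topology requiring the fewest changes (no ties occur in this example)."""
    return min(trees, key=lambda tree: tree_length(tree, characters, taxa))


TAXA = ("a-us", "b-us", "c-us", "d-us")
TREES = [
    ("a-us", ("b-us", ("c-us", "d-us"))),
    ("a-us", ("c-us", ("b-us", "d-us"))),
    ("a-us", ("d-us", ("b-us", "c-us"))),
]
OLD = ["0011", "0111"]                 # states listed in the order of TAXA
ALL = OLD + ["0101", "0101"]           # old observations plus the added characters

print(shortest_tree(TREES, OLD, TAXA))   # topology inferred from the first two characters
print(shortest_tree(TREES, ALL, TAXA))   # topology inferred after adding two characters
```

Both results are products of the same kind of inference applied to different sets of observations, which is the point taken up in the next paragraph.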

There are fundamental problems with this approach. The relation to Popper, much less testing in any 
hypothetico-deductive sense (e.g. [2], [3]), is entirely illusory. While only infrequently speaking of the testing of 
hypotheses as opposed to theories, Popper (1962: 241; 1966: 260-269, 362-364; 1983: 192-193, 349-352; 1992: 
132-134; 1994: 124, 133) held the already articulated view (e.g. Peirce 1932: 2.776; 1958a: 7.182, 7.206; Haack in 
Haack & Kolenda 1977: 69; Brent 1998: 117; see also Fitzhugh 2006a, 2010a) that test evidence must be 
consequences not only aligned as closely as possible with the causal conditions stated in the hypothesis, but those 
consequences must be of a variety different from and independent of the effects the hypothesis explains (Cleland 
2001, 2002, 2011b). Character data cannot serve as test evidence of hypotheses intended to causally account for 
those effects. This confusion between evidence used to (abductively—[1]) infer a hypothesis and evidence 
(deductively or inductively—[2], [3]) inferred for the purpose of subsequently testing (by way of induction) that 
hypothesis is what Lipton (1991, 2004, 2005; see also Maher 1988; Mayo 1996; Achinstein 2001; Fitzhugh 2010a; 
Cleland 2011b) characterized as accommodation versus prediction. A phylogenetic hypothesis is inferred to 
accommodate character data. In turn those data offer no opportunity to critically assess the causal conditions 
inferred to explain those data. Only relying on accommodated data makes it too easy to claim support for a 
hypothesis, with no inherent risk of refutation. It is not the case that (a-us (b-us (c-us, d-us))) has been 
falsified/disconfirmed relative to (a-us (c-us (b-us, d-us))) by the inclusion of new observations of characters. Indeed, no test 
has been performed. Observations of new characters cannot be validly deduced or predicted from an existing 
topology, such that those characters are consequences relevant to the assessment of the causal conditions asserted 
by the hypothesis (Sober 1988; Fitzhugh 2005a, 2006a, 2008a, 2010a; Vogt 2008). Since testing pertains to 
assessing causal claims, the relevant test observations would have to be effects that are as closely related as 
possible to the nuances of the conditions expounded in each hypothesis. Cladogram (a-us (b-us (c-us, d-us))), as a 
causal account, only has explanatory relevance to 0011 and 0111. The causal conditions presented in the 
hypothesis, albeit extremely vague, do not lend themselves to making predictions of other characters. Second, the 
inferences that led to (a-us (b-us (c-us, d-us))) and (a-us (c-us (b-us, d-us))) are both abductive, and as such, the 
hypotheses have no evaluative capacity relative to one another. At best one can say that (a-us (b-us (c-us, d-us))) 
has been replaced by (a-us (c-us (b-us, d-us))) for the fact that the explanations of new observations 0101 and 0101 
have relevance to the explanations of old observations 0011 and 0111 (see next section). Proceeding from the first 
inference to the next has resulted in no net positive or negative change in understanding. Rather, the two 
hypotheses are equivalent in that both only provide initial, equally vague answers to different sets of causal 
questions. 

Testing via disjunct hypotheses. Related to the misconception that character data alone serve as tests of 
phylogenetic hypotheses, there is the popular tendency to compare hypotheses inferred from different sets of data, 
most commonly ‘morphology’ versus nucleotide sequences (e.g. Asher et al. 2003; Asher et al. 2008; Chen et al. 
2003; von Dohlen et al. 2006; Crespo et al. 2007; Springer et al. 2007; Bourlat et al. 2008; Dunn et al. 2008; 
Prasad et al. 2008; Bailey et al. 2010; Regier et al. 2010; Meredith et al. 2011; Philippe 2011; Rota-Stabelli et al. 
2011; Vila et al. 2011; Crawford et al. 2012; see also examples discussed in Mooi & Gill 2010). The reasoning here 
is that congruence among topologies inferred from ‘independent’ data sets provides a measure of support or 
corroboration [sic] (e.g. Lienau & DeSalle 2010) for the overall ‘phylogeny’ of a group of organisms. There is, 
however, a two-fold problem with this approach. First, comparing cladograms inferred from sets of data that are 
explanatorily relevant to one another violates one of the basic tenets of rational reasoning—the requirement of total 
evidence (RTE). The RTE stipulates that if evidence has relevance, positive or negative, to the support for a 
particular conclusion, then that evidence must be taken into consideration as part of the premises used to infer that 
conclusion (Carnap 1950; Barker 1957; Hempel 1962, 1965, 1966, 2001; Salmon 1967, 1984a, 1984b, 1989, 1998; 
Sober 1975; Fetzer 1993; Fetzer & Almeder 1993; Lecointre & Deleporte 2005; Fitzhugh 2006a, 2006b). While 
regarded as necessary to engage in rational non-deductive reasoning (the requirement is automatically satisfied in 
deduction), the RTE has largely been either misconstrued, pro or con (Eernisse & Kluge 1993; Kluge & Wolf 1993; 
Nixon & Carpenter 1996; Kluge 1989, 1998, 2004; Rieppel 2003a; Lecointre & Deleporte 2005), or ignored (Bull 
et al. 1993; de Queiroz 1993; Miyamoto & Fitch 1995; Levasseur & Lapointe 2001) in the systematics literature. 
Oddly, the RTE is often equated with Popper’s notion of corroboration (Nixon & Carpenter 1996; Kluge 2004; 
Lienau & DeSalle 2010; cf. Fitzhugh 2006b) or verificationism (Bucknam et al. 2006), when in fact the principle 
transcends all inferential practices. Simply applying the RTE is not tantamount to testing. Regardless, systematists 
routinely speak of topological similarities and differences between cladograms inferred from different sets of data. 
Yet just such a maneuver that violates the RTE also indicates the evidential relevance of those data to one another 
for the sake of causally accounting for their occurrences. Ignoring the RTE in such instances provides no basis for 
critically assessing understanding from the perspective of testing. Rather it is the explanatory worthiness of 
cladograms, meager as it is, that is sacrificed in the name of a version (in name only) of testing that is just as 
distorted and vacuous as equating ‘total evidence analyses’ with testing. 

The most apparent consequence of denying the RTE in biological systematics research has been the view that 
disparate cladograms can be evaluated against one another, with congruence between topologies offering empirical 
support for a ‘phylogeny.’ Note however that in the context of cladograms, the term phylogeny refers to the sum 
total of causal events explaining relevant properties of organisms (cf. Table 1, Fig. 1). Drawing comparisons of 
branch arrangements between cladograms inferred from different data sets is, by definition, an exercise divorced 
from phylogeny. Such comparisons are without epistemic merit. Consider the following sets of data and inferred 
explanatory hypotheses: 

[5]  (a) • auxiliary theory(ies)                        (b) • auxiliary theory(ies) 
         • phylogenetic theory(ies) A, B, C,... n            • phylogenetic theory(ies) A, B, C,... n 
         • ‘morphology’ data set X                           • nucleotide data set Y 

         • (a-us (b-us (c-us, d-us)))                        • (a-us (b-us (c-us, d-us))). 

Both sets of inferences lead to explanatory hypotheses that provide some degree of initial causal understanding 
of the respective sets of observations. Does the ‘congruence’ between these topologies provide evidential support 
beyond what is initially offered by the separate sets of premises? In other words, do mere similarities in branch 
arrangements offer assessments of our ultimate causal understanding of observations? No. The two hypotheses, as 
explanatory constructs, have no relevant meaning to one another. Each provides vague causal accountings, per the 
theories applied in the inferences, to their respective sets of data. What are relevant, per the RTE, are the 
observations in need of being explained. And that relevance obviates separate inferences. That relevance is all the 
more apparent from the fact that internodal branches, nodes, and terminal branches of disparate cladograms 
represent the same specifiable classes of past causal events (Fitzhugh 2009: fig. 19), i.e. novel character origin/ 
fixation followed by population splitting. 

It may be asserted that as the inferences in [5] are products of ‘independent’ data sets, congruence of results 
offers positive support (e.g. Rota-Stabelli et al. 2010). For instance, Chen et al. (2003: 264, emphasis original) 
claim: 

The congruence of inferences separately drawn from independent data is considered as strong indicator 
of reliability. If we keep in mind the fact that molecular homoplasy may have different effects on tree 
reconstruction from one gene to another, obtaining the same clade from separate analysis of several 
genes despite this fact renders the clade even more reliable. In other words, obtaining the same tree or 
even some common clades means that there is a common structure in these data sets that must come 
from common evolutionary history. 

While such a notion of independence of evidence might appear consistent with what is actually required for 
testing (e.g. Popper 1992; Cleland 2001, 2002, 2011a; Fitzhugh 2006a, 2010a), the similarity is simply a 
consequence of the misuse of terms. Claiming, for instance, that ‘morphological’ characters are ‘independent’ of 
nucleotides might make sense if one is segregating these features according to particular criteria for the sake of 
establishing a classificatory arrangement of observations, e.g. ‘cellular’ as opposed to ‘molecular.’ But 
independence qua classes does not automatically translate into independence of test evidence. An equally serious 
consequence is that for results from one abductive inference to provide a basis for judging the veracity of results 
from a separate inference requires some extra-evidential criterion for weighing one hypothesis against another. A 
typical determining factor is, as alluded to in the quote from Chen et al. (2003), the a priori perceived problem of 
homoplasy. If one class of data is deemed to have an inordinate amount of homoplasy this could lead to ‘incorrect’ 
results. Partitioning data and comparing the separate cladogram topologies is claimed to allow one to judge 
reliability of the ‘overall phylogeny.’ What makes this an extra-evidential criterion is that one must impose 
hypotheses of homoplasy prior to even inferring such hypotheses from the data at hand. But such a criterion is 
erroneous because homoplasy is a class of ad hoc hypothesis that is the product of the abductive inference of 
phylogenetic hypotheses (Fitzhugh 2006a, 2006b). Asserting homoplasy prior to such inferences is nonsensical as 
it does nothing more than reduce one to concluding that what they perceive as the same characters among a group 
of organisms are not the same. This is not a matter of homoplasy, but rather the fact that one either cannot trust their 
own basic cognitive abilities or that they have already explained their observations. In the case of the latter, the 
‘same’ characters should immediately be regarded as different. It is only subsequent to this step that one would 
then engage in phylogenetic inference. If such an extra-evidential criterion did exist, and it does not, it would have 
to come into consideration prior to making the inferences, again as a simple matter of evidential relevance. 
Comparing phylogenetic hypotheses for partitioned sets of relevant data is both irrational and counter to 
subsequently assessing scientific understanding (Fitzhugh 2006a, 2006b, 2008c). 

Related to the situation outlined in [5], there is the alternate approach of taking partitioned data sets and 
applying different theories in the inferences of phylogenetic hypotheses. For example, 


[6]  (a) • auxiliary theory(ies)                        (b) • auxiliary theory(ies) 
         • phylogenetic theory(ies) A, B, C,... n            • phylogenetic theory(ies) J, K, L,... n 
         • ‘morphology’ data set X                           • nucleotide data set Y 

         • (a-us (b-us (c-us, d-us)))                        • (a-us (b-us (c-us, d-us))). 


Usually poorly articulated, and questionable in their justification (cf. Fitzhugh 2006a, 2006b), the common 
phylogenetic ‘theories’ include what are referred to as parsimony, maximum likelihood, and Bayesian. As with the 
previous example, it is customary to conclude that congruence between topologies offers some measure of support. 
The inherent problem that precludes such a conclusion, beyond violating the RTE and the specious argument from 
independence, is that one must assume that cladograms represent something beyond the causal conditions they 
imply per the theories used in their inference. Otherwise, to draw comparisons between cladograms that connote 
different causal parameters is a meaningless exercise. As the only scientifically viable way to interpret cladograms 
is that they are sets of vague explanatory hypotheses (Fig. 3B), comparisons of cladograms/hypotheses inferred 
from different theories is not a surrogate for testing and can provide no empirical critique of any degree of ultimate 
understanding. 

Finally, it might be argued that inferences of disjunct phylogenetic hypotheses using partitioned data sets are 
consistent with William Whewell’s (1847) ‘consilience of inductions.’ The commonly referred to characterization 
of this doctrine comes from volume two of Whewell’s (1847: 469, emphasis original) Philosophy of the Inductive 
Sciences, aphorism XIV: “The Consilience of Inductions takes place when an Induction, obtained from one class of 
facts, coincides with an Induction, obtained from another different class. This Consilience is a test of the truth of 
the Theory in which it occurs.” But as Laudan (1981: 165, emphasis original) noted in his analysis of Whewell’s 
writings on the subject, the principle is implemented under circumstances not always related to testing: 

(1) When an hypothesis is capable of explaining two (or more) known classes of facts (or laws); 

(2) When an hypothesis can successfully predict “cases of a kind different from those which were contemplated 
in the formation of our hypothesis;” 

(3) When an hypothesis can successfully predict or explain the occurrence of phenomena which, on the 
basis of our background knowledge, we would not have expected to occur. 

Laudan (1981: 166) pointed out that (1) is just a consequence of compiling relevant data for abductive inferences, 
much along the lines of what is stipulated by the RTE. It is an action of “...formal unification or simplification of 
our theories and hypotheses. By reducing two classes of phenomena—which had hitherto required separate and 
(seemingly) independent hypotheses or theories for their explanation—to one general hypothesis or theory,” and 
“achieves a reduction in the theoretical baggage required to ‘carry’ the known phenomena.” The circumstances 
surrounding (2) and (3) are more crucial in that these are actions leading to increased empirical content and 
understanding as consequences of testing per [2] and [3] (cf. Increasing causal understanding—testing, as intended, 
below). Whewell’s consilience of inductions cannot justify the inferences of disparate phylogenetic hypotheses 
from partitioned sets of data, much less purported empirical comparisons of those hypotheses as matters of testing. 
Circumstance (1) simply brings together relevant empirical content for the initial purpose of inferring an explanatory 
account. And just as was noted in the previous section, Testing a la Popper, (1) is not a surrogate for testing. 

Resampling methods. Attempts to garner support [sic] for phylogenetic hypotheses have also come from the 
adoption of procedures such as the bootstrap (Felsenstein 1985, 2004; Efron 1979; Efron & Tibshirani 1993; Efron 
et al. 1996; Holmes 2003; Soltis & Soltis 2003), jackknife (Farris et al. 1996; Miller 2003), and permutation tests 
(Faith & Cranston 1991; cf. Egan 2006 for a review of all of these approaches). These methods were originally 
developed to test statistical hypotheses through a process of random resampling, with or without replacement 
depending on the method, from among members of an original sample distribution to determine confidence 
intervals on a population parameter. Applications of these methods to phylogenetic hypotheses occur by randomly 
sampling characters from an original data matrix to create contrived data matrices of the original dimensions, from 
which new cladograms are inferred. The frequencies of groups or clades occurring among the cladograms are 
compared to groups/clades present in the original cladogram(s). The idea is that the more frequent the occurrences 
of contrived clades identical (in topology only) to those in the original cladogram(s) being ‘tested,’ the greater the 
support accorded those clades. 
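
The resampling step just described can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not any published implementation: the toy data matrix, the function names, and in particular `groups_from_matrix`, which stands in for cladogram inference by treating every character state 1 shared by two or more taxa as a group, are hypothetical.

```python
import random
from collections import Counter

# Hypothetical binary matrix: rows are species hypotheses, columns are
# characters (ventral appendages, white spot, antennae).
MATRIX = {
    "a_us": [0, 0, 0],
    "b_us": [0, 0, 0],
    "x_us": [1, 1, 0],
    "y_us": [1, 0, 1],
}

def groups_from_matrix(matrix, taxa):
    """Placeholder for cladogram inference (an assumption of this sketch):
    each character scored 1 in two or more, but not all, taxa defines a group."""
    n_chars = len(matrix[taxa[0]])
    groups = set()
    for j in range(n_chars):
        members = frozenset(t for t in taxa if matrix[t][j] == 1)
        if 2 <= len(members) < len(taxa):
            groups.add(members)
    return groups

def bootstrap_group_frequencies(matrix, n_reps=200, seed=0):
    """Resample characters with replacement and report how often each
    group of the original matrix reappears among the pseudoreplicates."""
    rng = random.Random(seed)
    taxa = sorted(matrix)
    n_chars = len(matrix[taxa[0]])
    original = groups_from_matrix(matrix, taxa)
    counts = Counter()
    for _ in range(n_reps):
        # Contrived matrix of the original dimensions.
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        pseudo = {t: [matrix[t][j] for j in cols] for t in taxa}
        for group in groups_from_matrix(pseudo, taxa) & original:
            counts[group] += 1
    return {group: counts[group] / n_reps for group in original}

if __name__ == "__main__":
    for group, freq in bootstrap_group_frequencies(MATRIX).items():
        print(sorted(group), freq)
```

In this toy case the group (x_us y_us) reappears whenever the single appendage character is drawn at least once, so its frequency is expected to lie near 1 - (2/3)^3, roughly 0.70. Note that the procedure only counts recurring topological labels; nothing in it consults effects other than the characters already being explained.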

There are several problems associated with attempting to claim support for phylogenetic hypotheses by the use
of resampling methods (cf. discussion in Fitzhugh 2006a). The first is that these methods are intended for testing
statistical, not explanatory, hypotheses (but see Farris 2002, and Goloboff et al. 2003, for views that resampling
does not require statistical assumptions). Statistical hypotheses characterize the properties of a class or population,
while explanatory hypotheses offer past causal conditions or events that account for specific, present effects. The
distinction with regard to testing is the nature of the test evidence. Statistical hypotheses rely on test evidence that
is the same class of effects from which hypotheses are inferred. Effects that serve as test evidence for explanatory
hypotheses are independent of the class of effects from which hypotheses are inferred, such that that evidence
consists of effects that are related as narrowly as possible to the hypothesized causal events/conditions, thus having the
lowest probability of occurrence if hypothesized causal conditions did not occur (cf. [2], [3]; see also Descriptive,
proximate, and ultimate understanding; Testing a la Popper; Increasing causal understanding: testing as
intended). The application of resampling methods to evaluate the initial, abductive understanding afforded by
phylogenetic hypotheses fails for much the same reason that the addition of new character data cannot serve as test
evidence (see Testing a la Popper): character data of organisms are not relevant test evidence to judge the veracity
of the hypotheses of causal conditions inferred to explain those data. A second notable problem is related to what
was outlined earlier (Testing and support issues) regarding the distinction between abductive support for
hypotheses versus support by way of testing. What is at issue when speaking of support for a cladogram is not topological
groups, but rather the separate hypotheses of character origin/fixation and population splitting events implied by
cladograms (Fig. 3). As one cannot establish that a ‘clade’ or ‘group’ inferred from a resampling procedure refers
to the empirically identical phylogenetic hypotheses (cladogram) being evaluated, any comparisons are between
nothing more than branching diagrams, not clades-as-composite-hypotheses. As was noted under Testing and
support issues, abductive support for hypotheses is immediately constrained by the conjunctions of theory and effects
(cf. [1]). And while that support can be directly tabulated (Fig. 3), it is not the requisite support needed to evaluate
the causal claims among those hypotheses.

Bremer ‘support’. A fourth type of technique to determine hypothesis support is the Bremer support analysis
or decay index (Bremer 1988, 1994; Davis 1995)⁴. As with resampling methods discussed earlier, the support to
which Bremer refers is that of individual clades in a cladogram, determined by how many extra steps are required on a
cladogram to eliminate each clade. The idea is that the larger the number of steps required to alter topologies, the
better supported is the overall cladogram. Fitzhugh (2006a: 100, emphasis added) pointed out that Bremer support
offers no semblance of support for phylogenetic hypotheses:

The examination of cladograms of greater length for the eventual ‘collapse’ of what is interpreted as the
‘same clade’ cannot have any empirical meaning regarding the actual evidential support that allows for
that clade in the original hypothesis being ‘evaluated.’ Hypotheses of greater length stand on their own as
independent causal accounts that are derived from premises that differ from those used to infer a
‘minimum-length’ cladogram. The difference in premises would have to refer to the interpretations of at least
some shared similarities as not being the same characters, which would imply a different set of causal
questions from what were originally asked. The overall consequence is that the exercise of comparing
these hypotheses with the hypothesis in question, much less noting the number of steps required to
collapse clades of that hypothesis, cannot provide the indication of support that proponents have suggested.

As with the other purported test or evaluative procedures outlined so far in this section, Bremer support is nothing 
but an exercise in ‘branch manipulation.’ It has no relation to the observations that resulted in a hypothesis, much 
less assessing the underlying causal events it implies (contra Grant & Kluge 2007). 
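
For readers unfamiliar with the calculation being criticized, the decay index can be sketched as a simple difference in step counts. The following is an illustrative sketch only: the function name, the representation of cladograms as (length, set of clades) pairs, and the tree lengths are all hypothetical and are not taken from any cited study or software package.

```python
def bremer_support(trees, clade):
    """Decay index for `clade`: length of the shortest supplied cladogram
    lacking the clade, minus the length of the shortest cladogram overall.

    trees: iterable of (length_in_steps, set_of_clades) pairs, assumed to
    include the minimum-length cladogram(s) and some longer ones.
    """
    trees = list(trees)
    shortest = min(length for length, _ in trees)
    lacking = [length for length, clades in trees if clade not in clades]
    if not lacking:
        return None  # the clade never collapses among the trees supplied
    return min(lacking) - shortest

XY = frozenset({"x_us", "y_us"})
AY = frozenset({"a_us", "y_us"})

# Hypothetical candidate cladograms and their lengths in steps.
CANDIDATE_TREES = [
    (5, {XY}),   # minimum-length cladogram containing (x_us y_us)
    (6, {AY}),   # one step longer, lacks (x_us y_us)
    (7, set()),  # unresolved, two steps longer
]

if __name__ == "__main__":
    print(bremer_support(CANDIDATE_TREES, XY))  # 6 - 5
```

The sketch makes the point of the preceding paragraph concrete: the value is computed from clade labels and step counts alone, with no reference to test evidence for the causal events a clade implies.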

Bremer support has also been used as an argument to combine ‘morphological’ and nucleotide sequence data. 
For instance, in their inferences of phylogenetic hypotheses among placental mammals from partitioned and 
combined data, Lee and Camens (2009: 2244) noted that “when morphology is combined with extensive molecular 
data, morphology increases branch (Bremer) support for every clade in the preferred [sic] tree.” This is a matter of 
incorrectly conflating the requirement of total evidence (RTE) with testing. Bremer support has no relation to, and 
provides no epistemic basis for the RTE. As discussed earlier, the RTE is not an ex post facto criterion. On the 
matter of evidential support for clades, Bremer-support values are detached from reality. As clades are 
diagrammatic representations capable of nothing more than implying hypothesized causal events (Fig. 3), no 
amount of character data can be brought to bear on the subject of support for or against such hypotheses that 
account for those data. Pertinent support, and thus further causal understanding, is garnered as a consequence of 
proper testing. 

Increasing ultimate causal understanding—testing as intended. The relation between observed characters 
of organisms and specific/phylogenetic hypotheses is one of effects being explained by ultimate causes. This is a 
direct consequence of the overarching goal of scientific inquiry, i.e. to pursue causal understanding by way of 
explanations of observed effects and predictions of future phenomena (Hempel 1965: 139). It is this relation that 
establishes what would be required to test specific and phylogenetic hypotheses. A contrived example presented by 
Fitzhugh (2010a) can be used to schematically outline this process. To begin, consider as background knowledge 
that there are known groups of organisms to which specific hypotheses a-us, b-us, etc., have been applied in the 
past. New individuals with unanticipated or surprising characters are subsequently observed (Fig. 4A). These new 
observations lead to three causal questions⁵:

[7] Q1: Why do some of these individuals have a white spot in contrast to completely black?

Q2: Why do some of these individuals have antennae in contrast to a smooth dorsum?

Q3: Why do individuals to which specific hypotheses x-us and y-us apply have ventral appendages?

Answers to questions Q1 and Q2 are provided by way of the abductive inferences of specific hypotheses x-us and
y-us, respectively. For instance, the inference providing an answer to Q2 has the form (cf. Fitzhugh 2009):


4. The assessment in this section also applies to the derivative ‘ratio of explanatory power’ (REP) of Grant and Kluge (2007, 
2008). Contrary to claims by these authors, hypothesis support in the context of phylogenetic inference—given that it is 
abductive—is nothing more than the relation between premises and conclusion(s) (cf. [1]). Making a distinction between 
support and optimality as suggested by Grant and Kluge is gratuitous. 

5. Regardless of observations being ‘morphological’ or nucleotide sequences, the causal questions would be identical in form
(cf. Fitzhugh 2006a-c).

[8] Species Theory: If character x(1) originates by mechanisms a, b, c...n, among gonochoristic or
cross-fertilizing hermaphroditic individuals of a reproductively isolated population with character
x(0), and x(1) subsequently becomes fixed throughout the population during tokogeny by
mechanisms d, e, f...n, then individuals observed in the present will exhibit character x(1).

Observations (effects): Individuals have a dorsal margin with antennae in contrast to a smooth
dorsal margin as seen among individuals to which other specific hypotheses (a-us, b-us, etc.) refer.


Causal Conditions (specific hypothesis y-us): The antennate dorsal margin condition originated by
unspecified mechanisms within a reproductively isolated population with smooth dorsal margins
and eventually became fixed throughout the population during tokogeny by additional unspecified
mechanisms.

Answering Q3 is also by way of abduction, but in this instance to a phylogenetic hypothesis (cf. Fitzhugh
2009):

[9]⁶ Phylogenetic Theory: If character x(0) exists among individuals of a reproductively isolated,
gonochoristic or cross-fertilizing hermaphroditic population and character x(1) originates by
mechanisms a, b, c...n, and becomes fixed within the population by mechanisms d, e, f...n
(= ancestral species hypothesis), followed by event(s) g, h, i...n, wherein the population is divided
into two or more reproductively isolated populations, then individuals to which descendant species
hypotheses refer would exhibit x(1).

Observations (effects): Individuals to which specific hypotheses x-us and y-us refer have
ventrolateral margins with appendages in contrast to smooth as seen among individuals to which
other species hypotheses (a-us, b-us, etc.) refer.


Causal Conditions (phylogenetic hypothesis X-us): Ventrolateral margin appendages originated
by some unspecified mechanism(s) within a reproductively isolated population with smooth
ventrolateral margins, and the appendage condition became fixed in the population by some
unspecified mechanism(s) (= ancestral species hypothesis), followed by an unspecified event(s) that
resulted in two or more reproductively isolated populations.

Although separately inferred as responses to questions Q1-Q3, the answers are graphically represented by
cladogram (a-us b-us (x-us y-us)). Notice the stark contrast between this portrayal of phylogenetic inference and Sober’s
(2002: 157) characterization: “The first problem is... one infers a tree;... one uses an inferred tree to solve a further
problem [of ancestral character transformation].”


6. This inference requires comment, as it departs from standard approaches, i.e. ‘parsimony,’ ‘maximum likelihood,’
‘Bayesian.’ While there is some resemblance to what is commonly referred to as ‘parsimony analysis,’ the premises are
determined by question Q3, not parsimony. Parsimony is, however, a relevant factor in linking the question to an inference that
maintains as much as possible the empirical content in the question (Sober 1975). As abductive inference using a common
cause theory will lead to ‘most parsimonious’ conclusions, the hypothesis is by definition also of maximum likelihood in
the sense of (abductive) support accorded the hypothesis by the premises (Sober 1988; Fitzhugh 2006a, 2006b). What is
referred to as ‘maximum likelihood analysis’ (ML) (sensu Felsenstein 1981, 2004; Swofford et al. 1996; Huelsenbeck &
Crandall 1997), however, is problematic in that it implements a theory (‘model’) that is at odds with phylogenetic-based
causal questions. ML concerns itself with branch lengths, thus the explanatory scope cannot be phylogenetic, but rather
specific or intraspecific (Fig. 1; cf. Fitzhugh 2006a, 2006b). ‘Bayesian analysis’ (e.g. Huelsenbeck et al. 2001;
Huelsenbeck & Ronquist 2001; Archibald et al. 2003; Ronquist et al. 2009) is erroneous for the fact that Bayesianism addresses
determinations of hypothesis acceptance/belief on the basis of test evidence subsequent to hypothesis inference. ‘Bayesian
analysis’ erroneously estimates phylogenetic hypotheses using posterior probabilities derived from the character data
explained by those hypotheses. To make such ‘inferences,’ one must rely upon empirically empty cladograms and a theory
that is inconsistent with relevant causal questions (as in ML), and treat character data as test evidence [sic].

[Figure 4. Panel labels: Species hypotheses a-us, b-us, etc.; x-us; y-us; present.]

FIGURE 4. Relations between observations (A) leading to specific and phylogenetic hypotheses (B), and the evidence
required to test those hypotheses (C). Modified from Fitzhugh (2010a: fig. 2).

With regard to testing (a-us b-us (x-us y-us)), we have to acknowledge what hypotheses are involved. If (a-us
b-us (x-us y-us)) conveys the products of the above inferences, then a minimum of four hypotheses are candidates
for testing (Figure 4B). Specific hypotheses x-us and y-us provide respective explanations of characters by way of
origin and subsequent fixation in ancestral populations (Fig. 4B, h1-2: selection). The phylogenetic hypothesis
actually entails at least two hypotheses, origin and subsequent fixation of appendages (Fig. 4B, h3a: selection) and
subsequent population splitting (Fig. 4B, h3b: population splitting; cf. Fig. 3). The test evidence required for h1-3
would have to be respective effects that could be associated as narrowly as possible with each of the causal events
in the hypotheses (Fig. 4C, e1-e2, cf. [2], [3]). But this would first necessitate filling out the causal conditions with
sufficient specifics such that useful predictions of relevant test evidence can be specified. For instance, hypotheses
h3a-b, represented by (a-us b-us (x-us y-us)) (Fig. 4C), might be expanded to state that ventrolateral appendages
originated within an ancestral population, and that feature conferred a selective advantage in competition for food
resources, leading to fixation of the character in the population. This modest increase in detail might allow for
predicting consequences from these conditions that could serve as potential test evidence, e.g. presence of food
remains indicating a shift in diet associated with individuals with appendages, correlated with increasing frequency
of remains of individuals with appendages in a particular region. In similar fashion, evidence of a population
splitting event would require stipulating causal specifics, e.g. vicariance via tectonic events, from which effects as
narrowly associated as possible with such a class of events might be predicted. Schematically, predictions of
potential test evidence for h3a-b would have the form (cf. [2]),

[10] Phylogenetic Theory: If character x(0) exists among individuals of a reproductively isolated,
gonochoristic or cross-fertilizing hermaphroditic population and character x(1) originates by
mechanisms a, b, c...n, and becomes fixed within the population by mechanisms d, e, f...n
(= ancestral species hypothesis), followed by event(s) g, h, i...n, wherein the population is divided
into two or more reproductively isolated populations, then individuals to which descendant species
hypotheses refer would exhibit x(1).

Causal Conditions (phylogenetic hypothesis X-us): Ventrolateral margin appendages originated
by events X1, X2, X3,...n within a reproductively isolated population with smooth ventrolateral
margins, and the appendage condition became fixed in the population via events Y1, Y2, Y3,...n
(= ancestral species hypothesis), followed by events Z1, Z2, Z3,...n that resulted in two or more
reproductively isolated populations.

Original observations (effects): Individuals to which specific hypotheses x-us and y-us refer have
ventrolateral margins with appendages in contrast to smooth as seen among individuals to which
other species hypotheses (a-us, b-us, etc.) refer.

Predicted test consequences: Effects X1', X2', X3',...n and Y1', Y2', Y3',...n should be observed,
indicating the causal events of character origin and fixation of appendages, respectively, among
individuals of an ancestral population (cf. Fig. 4C: h3a), and effect(s) Z1', Z2', Z3',...n should be
observed, indicating occurrences of causal events resulting in splittings of populations into
separate, reproductively isolated groups (cf. Fig. 4C, h3b).

Notice that potential test consequences from the selection hypothesis are independent of the effects that prompted 
inference of the hypothesis. While both classes of evidence are inferred to be effects of a common causal event, 
evidence of selective advantages for the presence of appendages lies beyond the mere presence of those appendages.
This is the independence of evidence referred to by Popper (e.g. 1992: 132-133) regarding effects being explained 
by a hypothesis and effects serving as test evidence for that hypothesis. Within the mechanics of testing specific or 
phylogenetic hypotheses, simply segregating suites of characters into different classes is not tantamount to this type 
of independence. 

Actually carrying out the testing of hypotheses h3a and h3b (Fig. 4C) would have the form presented in [3]:






[11] Auxiliary theory(ies): Stated as relevant and necessary to the test.

Phylogenetic Theory: If character x(0) exists among individuals of a reproductively isolated,
gonochoristic or cross-fertilizing hermaphroditic population and character x(1) originates by
mechanisms a, b, c...n, and becomes fixed within the population by mechanisms d, e, f...n
(= ancestral species hypothesis), followed by event(s) g, h, i...n, wherein the population is divided
into two or more reproductively isolated populations, then individuals to which descendant species
hypotheses refer would exhibit x(1).

Actual test conditions: Descriptions of actions taken to enable potential observations of test results.

Test results: Effects X1', X2', X3',...n and Y1', Y2', Y3',...n are observed, indicating the causal events of
character origin and fixation of appendages, respectively, among individuals of an ancestral
population (cf. Fig. 4C: h3a), and effect(s) Z1', Z2', Z3',...n are observed, indicating occurrences of
causal events resulting in splittings of populations into separate, reproductively isolated groups (cf.
Fig. 4C, h3b).


Conclusions: Hypotheses h3a and h3b are confirmed.

There is, however, the consequence of realizing the actual limitations to testing specific and phylogenetic
hypotheses: the time that has elapsed between the hypothesized causes and observations of effects in the present
can severely limit or preclude the existence of test evidence. While testing is possible in principle, it might not
be feasible given inherent constraints. In the absence of both filling out (a-us b-us (x-us y-us)) to the point of
providing potential test evidence (cf. [2], [10]) and actually engaging in testing (cf. [3], [11]),
ultimate causal understanding provided by specific and phylogenetic hypotheses remains rudimentary in that it is
only the simple conjunctions of theories and effects-to-be-explained (cf. [1], [8], [9]).
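
The logic of the testing schemata in this section can be summarized in a small sketch. It is offered only as an illustration under stated assumptions: the class, the function names, the status labels, and the effect descriptions are hypothetical conveniences, and the set operations are a deliberately crude stand-in for the judgment involved in deciding what counts as independent test evidence.

```python
from dataclasses import dataclass, field

@dataclass
class CausalHypothesis:
    name: str
    # Effects that prompted the abductive inference (character data).
    explained_effects: frozenset
    # Predicted test consequences; empty until causal details are filled in.
    predicted_effects: frozenset = field(default_factory=frozenset)

def independent_predictions(hypothesis):
    """Keep only predictions lying outside the effects being explained."""
    return hypothesis.predicted_effects - hypothesis.explained_effects

def evaluate(hypothesis, observed):
    """Return 'untested', 'confirmed', or 'not confirmed'."""
    predictions = independent_predictions(hypothesis)
    if not predictions:
        return "untested"  # only abductive support is available
    return "confirmed" if predictions <= set(observed) else "not confirmed"

APPENDAGES = frozenset({"ventrolateral appendages in x-us and y-us"})

# Explanation sketch: no causal specifics, so nothing independent to test.
sketch = CausalHypothesis("h3a", APPENDAGES, APPENDAGES)

# Filled-out version with hypothetical predicted test consequences.
filled = CausalHypothesis(
    "h3a",
    APPENDAGES,
    frozenset({"diet shift in food remains",
               "rising frequency of appendage-bearing remains"}),
)

if __name__ == "__main__":
    print(evaluate(sketch, APPENDAGES))
    print(evaluate(filled, filled.predicted_effects))
```

Re-presenting the original character data, as in the first call, leaves the hypothesis untested; only the second, filled-out hypothesis admits a test outcome, and then only if the predicted effects can still be observed.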


Conclusions 

Scientific understanding occurs by way of explanation through the fitting of observations into some broader 
theoretical framework, not only by offering initial information about possible causes but also through the ability to 
anticipate and investigate consequences related to those causes as matters of critical evaluation. Understanding is 
also context dependent in that it is a state of mind contingent on what individuals regard as being sufficient for 
meeting their standard of understanding. What provides an adequate explanation for one individual might be 
unsatisfactory to another. As a result, some yardstick by which to judge the adequacy of understanding is required. 
Surely any modicum of consensus on the adequacy of hypotheses in the sciences should come from the extent to 
which results from empirical testing are manifested. Therein lie two problems for biological systematics. Ultimate
hypotheses, especially specific and phylogenetic, are often devoid of causal details, such that the state of
understanding, beyond the initial explanatory notions presented under the rubric of ‘taxa’ or cladograms, is neither
pursued nor enhanced. Even if these hypotheses are filled out to the point that valid test predictions can be
stipulated, it is likely that actual testing will be impractical in nearly all instances, as noted earlier. Instead, critical 
assessments of hypotheses are stalled, such that there is the tendency to orient back to original character 
observations, such as interesting correlations among features, or pursuing investigations of the finer structural 
components of characters (descriptive) or their ontogenetic development (proximate). In other words, the general 
reaction is to move backward in an epistemic sense to consider the enhancement of descriptive and proximate 
understanding, rather than actually pursuing ultimate understanding in terms of critical hypothesis assessment. 
Taken at face value, there is nothing wrong with such a maneuver, but we do need to be cognizant that enhancing 
descriptive or proximate understanding does nothing to promote continued ultimate understanding, conceived as
testing in the sciences. Several of the classes of systematics hypotheses fail to provide substantive growth in causal
understanding because our tendency is to maneuver away from testing those hypotheses.

It is with the advent of nucleotide sequencing that the magnitude of this problem has increased. For instance, in
what way does an ultimate hypothesis lead to the pursuit of causal understanding of nucleotides in a particular
sequence? While we can readily associate epistemic consequences of causal questions regarding observations of
‘morphological’ characters and the cladograms that serve as vague answers (cf. [7]-[11]), what are the merits of
asking questions of the form, “Why do I observe a T at position 482 as opposed to A in this string of nucleotides”?
Indeed, such questions are necessarily implied, just as they are for any other classes of characters, in a data matrix
(Fitzhugh 2006c). In other words, what understanding is attained by inferring cladograms to answer such questions
if the accumulation of sequence data cannot promote a process of reexamining and testing descriptive, proximate or
ultimate understanding? Are such questions at the level of individual nucleotides even appropriate or relevant? The
routine pattern in ‘molecular’ systematics is to (a) sequence nucleotides, (b) infer cladograms (or groups of
mutually exclusive cladograms from partitioned data sets and/or contradictory theories), (c) publish cladograms, and
(d) either proceed to another project and repeat steps (a)-(c), or perform more sequencing and repeat steps (a)-(c).
This approach is exemplified by research programs determining metazoan phylogenetic relationships, e.g. Giribet
et al. (2000); Halanych (2004); Philippe and Telford (2006); Dunn et al. (2008); Philippe et al. (2009); Schierwater
et al. (2009); Philippe et al. (2011). With regard to the phylogenetic hypotheses inferred, there is nothing in this
pattern of activity that enhances evolutionary understanding, neither in terms of the vague explanatory nature of
the cladograms produced, nor by the fact that proper testing is infeasible. There is the clear indication that causal
understanding of sequence data is far less important than simply deriving branching diagrams from which one
might refer to taxa (= explanatory hypotheses) that have been previously characterized using morphology. This is
not a problem limited to considerations of sequence data. Analogous instances of steps (a)-(d) also can be found
among groups of organisms with both extensive neontological (morphology, sequence data) and paleontological
data, e.g. cetacean phylogeny (cf. review by Uhen 2010 for neontological and paleontological morphological
studies; Gatesy 1998, Montgelard et al. 2007, O’Leary & Gatesy 2008, Spaulding et al. 2009, Xiong et al. 2009
regarding sequence data). To that end, the view of O’Leary and Gatesy (2008: 400; see also Spaulding et al. 2009:
1, 12) exemplifies this mischaracterization of pursuing understanding, by equating adherence to the requirement of
total evidence with testing (cf. Testing a la Popper, Testing via disjunct hypotheses):

Continued synthesis of molecular and morphological data from extant and extinct taxa remains the 
strongest test of phylogenetic hypotheses and the best summary of the common signal in the diverse data 
available for phylogenetics.... 

Mooi and Gill (2010: 27) echo the problem just described: 

Solving character conflict is at the crux of systematics. Conflicting hypotheses of relationship can be 
addressed through: (1) a declaration of one to be true based on our own authority, (2) a re-examination of 
characters supporting each to discover, understand and potentially resolve conflicts, (3) the introduction 
of an additional source of data (either from other character complexes or with different or additional 
taxa) to produce yet another tree, (4) the presentation of the data in a manner where conflict is obscured 
and avoids scrutiny. 

The issue of ‘character conflict’ is, however, a contrived problem, readily solved by correctly applying the 
requirement of total evidence in the act of abductively inferring hypotheses. More substantial is the fact that the 
maneuvers outlined by Mooi and Gill are symptomatic of the erroneous view that ultimate causal understanding in 
systematics can be achieved solely by manipulations of characters, under the headings of testing, Popperian 
corroboration, support, etc. 

In light of the above consequences, we need to acknowledge that biological systematics is a venue for 
increasing ultimate understanding that is inherently limited. The greatest strength of systematics has been as a 
vehicle for prompting one to revert back to considerations of descriptive and proximate understanding, rather than 
actually pushing forward with critically evaluating ultimate understanding through the process of testing. This 
conclusion is not to impugn the importance of systematics, but rather to point out that the perceived productivity of 
systematics research programs, especially those that are specific and phylogenetic, is far more constrained than
usually assumed. Recognizing these inherent limitations would aid in better streamlining systematics research to 
maximize productivity in the sense of actually shifting causal understanding to more descriptive and proximate 
causal levels. 

My intent in outlining standard approaches in systematics in terms of Mayr’s (1961) proximate and ultimate
causes in biology is to highlight the too often unrealized boundaries actually imposed on the field. Ignoring those
limitations has resulted in the development of methodological and research pursuits that are at odds with the goal of
attaining causal understanding as part of scientific inquiry. At its best, systematics enhances descriptive
understanding, and within limits the pursuit of proximate causal understanding. Where it has been especially remiss is in
elevating the importance of specific and phylogenetic hypotheses beyond what they usually are (initial, very
vague explanation sketches), as well as claiming increases in evolutionary understanding where none exists.